We introduce an algorithm for word-level text spotting that is able to accurately and reliably determine the bounding regions of individual words of text "in the wild". Our system is formed by the cascade of two convolutional neural networks. The first network is fully convolutional and is in charge of detecting areas containing text. This results in a very reliable but possibly inaccurate segmentation of the input image. The second network (inspired by the popular YOLO architecture) analyzes each segment produced in the first stage, and predicts oriented rectangular regions containing individual words. No post-processing (e.g. text line grouping) is necessary. With an execution time of 450 ms for a 1000-by-560 image on a Titan X GPU, our system achieves the highest score to date among published algorithms on the ICDAR 2015 Incidental Scene Text dataset benchmark.
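To make "oriented rectangular regions" concrete, here is a minimal sketch of how one such region could be turned into four corner points. The abstract does not say how the second network encodes its rectangles, so the parameterization used here (centre, width, height, rotation angle in radians) and the function name `oriented_box_corners` are assumptions for illustration only, not the authors' method.

```python
import math


def oriented_box_corners(cx, cy, width, height, angle_rad):
    """Return the four corners of a rectangle centred at (cx, cy).

    The (centre, size, angle) encoding is an assumed parameterization;
    the abstract does not specify the one used by the network.
    """
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    half_w, half_h = width / 2.0, height / 2.0
    # Corner offsets in the box's own (unrotated) frame, listed in order
    # around the rectangle.
    offsets = [
        (-half_w, -half_h),
        (half_w, -half_h),
        (half_w, half_h),
        (-half_w, half_h),
    ]
    # Rotate each offset by the angle, then translate to the centre.
    return [
        (cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a)
        for dx, dy in offsets
    ]
```

With an angle of zero the function returns an ordinary axis-aligned box; a non-zero angle rotates the same box about its centre, which is what lets a single rectangle fit a slanted word tightly.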